Matrix exponential

In mathematics, the matrix exponential is a matrix function on square matrices analogous to the ordinary exponential function. Abstractly, the matrix exponential gives the connection between a matrix Lie algebra and the corresponding Lie group.

Let X be an n×n real or complex matrix. The exponential of X, denoted by e^X or exp(X), is the n×n matrix given by the power series

e^\mathbf{X} = \sum_{k=0}^\infty{1 \over k!}\mathbf{X}^k.

The above series always converges, so the exponential of X is well-defined. Note that if X is a 1×1 matrix the matrix exponential of X is a 1×1 matrix consisting of the ordinary exponential of the single element of X.
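The definition can be tried out directly by truncating the power series. The following is a minimal sketch in plain Python, assuming small matrices stored as lists of rows; the helper names `mat_mul` and `expm_series` and the cutoff of 30 terms are choices made for this illustration, and a truncated series is not a robust general-purpose method.

```python
import math

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def expm_series(x, terms=30):
    """Approximate exp(x) by the first `terms` terms of the power series."""
    n = len(x)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # holds x^k / k!
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

# 1x1 case: the matrix exponential reduces to the ordinary exponential.
one_by_one = expm_series([[2.0]])
# Zero matrix: only the k = 0 term survives, giving the identity.
zero_case = expm_series([[0.0, 0.0], [0.0, 0.0]])
print(one_by_one[0][0], math.exp(2.0))
print(zero_case)
```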

Properties

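Let X and Y be n×n complex matrices and let a and b be arbitrary complex numbers. We denote the n×n identity matrix by I and the zero matrix by 0. The matrix exponential satisfies the following properties:

e^0 = I,

e^{aX}e^{bX} = e^{(a+b)X},

e^{X}e^{-X} = I,

if XY = YX, then e^{X}e^{Y} = e^{Y}e^{X} = e^{X+Y},

if Y is invertible, then e^{YXY^{-1}} = Ye^{X}Y^{-1}.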

Linear differential equation systems

Main article: matrix differential equation

One of the reasons for the importance of the matrix exponential is that it can be used to solve systems of linear ordinary differential equations. The solution of

 \frac{d}{dt} y(t) = Ay(t), \quad y(0) = y_0,

where A is a constant matrix, is given by

 y(t) = e^{At} y_0. \,
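As a quick numerical illustration of y(t) = e^{At} y_0, the sketch below uses a matrix chosen for this example (it does not come from the text): A = [[0, 1], [-1, 0]] with y_0 = (1, 0), whose exact solution is (cos t, -sin t). The helper names and the truncated-series exponential are assumptions of the sketch.

```python
import math

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def expm_series(x, terms=30):
    """Truncated power series for exp(x); adequate for small norms only."""
    n = len(x)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def solve_linear_ode(a, y0, t):
    """Return y(t) = exp(t*a) y0 for the system y' = a y, y(0) = y0."""
    e = expm_series([[t * v for v in row] for row in a])
    return [sum(e[i][j] * y0[j] for j in range(len(y0)))
            for i in range(len(y0))]

A = [[0.0, 1.0], [-1.0, 0.0]]
y = solve_linear_ode(A, [1.0, 0.0], 1.0)
print(y, (math.cos(1.0), -math.sin(1.0)))
```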

The matrix exponential can also be used to solve the inhomogeneous equation

 \frac{d}{dt} y(t) = Ay(t) + z(t), \quad y(0) = y_0.

See the section on applications below for examples.

There is no closed-form solution for differential equations of the form

 \frac{d}{dt} y(t) = A(t) \, y(t), \quad y(0) = y_0,

where A is not constant, but the Magnus series gives the solution as an infinite sum.

The exponential of sums

We know that the exponential function satisfies e^{x+y} = e^x e^y for any real numbers (scalars) x and y. The same goes for commuting matrices: if the matrices X and Y commute (meaning that XY = YX), then

e^{X+Y} = e^Xe^Y. \,

However, if they do not commute, then the above equality does not necessarily hold, in which case we can use the Baker–Campbell–Hausdorff formula to compute e^{X+Y}.

The converse is false: the equation e^{X+Y} = e^X e^Y does not necessarily imply that X and Y commute. However, the converse is true if X and Y contain only algebraic numbers and their size is at least 2×2 (Horn & Johnson 1991, pp. 435–437).
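To see the failure for non-commuting matrices concretely, the sketch below compares both sides for a pair of 2×2 matrices picked for this illustration (they are not taken from the text). The helpers are a plain-Python truncated series, assumed adequate for such small inputs.

```python
import math

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def expm_series(x, terms=30):
    """Truncated power series for exp(x); adequate for small norms only."""
    n = len(x)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

X = [[0.0, 1.0], [0.0, 0.0]]
Y = [[0.0, 0.0], [1.0, 0.0]]
commute = mat_mul(X, Y) == mat_mul(Y, X)      # False for this pair

sum_xy = [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]
lhs = expm_series(sum_xy)                      # exp(X + Y)
rhs = mat_mul(expm_series(X), expm_series(Y))  # exp(X) exp(Y)
print(commute)
print(lhs)
print(rhs)
```

Here exp(X + Y) has cosh(1) on the diagonal, while exp(X) exp(Y) has 2 in its top-left entry, so the two sides differ.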

For Hermitian matrices there are two notable theorems related to the trace of matrix exponentials:

Golden–Thompson inequality

If A and H are Hermitian matrices, then

\operatorname{tr}\exp(A+H) \leq \operatorname{tr}(\exp(A)\exp(H)). [1]

Note that there is no requirement of commutativity. There are counterexamples to show that the Golden–Thompson inequality cannot be extended to three matrices. However, the next theorem accomplishes this in a way.
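The Golden–Thompson inequality can be checked numerically on a small case. The sketch below uses two real symmetric (hence Hermitian) matrices that do not commute; the matrices and helper names are assumptions made for this illustration, and a single example is of course not a proof.

```python
import math

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def expm_series(x, terms=30):
    """Truncated power series for exp(x); adequate for small norms only."""
    n = len(x)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))

A = [[1.0, 0.0], [0.0, -1.0]]
H = [[0.0, 1.0], [1.0, 0.0]]
A_plus_H = [[A[i][j] + H[i][j] for j in range(2)] for i in range(2)]

lhs = trace(expm_series(A_plus_H))                     # tr exp(A + H)
rhs = trace(mat_mul(expm_series(A), expm_series(H)))   # tr(exp(A) exp(H))
print(lhs, rhs)
```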

Lieb's theorem

For a fixed Hermitian matrix H the function

 f(A) = \operatorname{tr} \exp \left (H + \log A \right)

is concave on the cone of positive-definite matrices. [2]

The exponential map

Note that the exponential of a matrix is always an invertible matrix. The inverse matrix of e^X is given by e^{-X}. This is analogous to the fact that the exponential of a complex number is always nonzero. The matrix exponential then gives us a map

\exp \colon M_n(\mathbb C) \to \mathrm{GL}(n,\mathbb C)

from the space of all n×n matrices to the general linear group of degree n, i.e. the group of all n×n invertible matrices. In fact, this map is surjective, which means that every invertible matrix can be written as the exponential of some other matrix (for this, it is essential to consider the field C of complex numbers and not R).

For any two matrices X and Y, we have

 \| e^{X+Y} - e^X \| \le \|Y\| e^{\|X\|} e^{\|Y\|},

where || · || denotes an arbitrary matrix norm. It follows that the exponential map is continuous and Lipschitz continuous on compact subsets of M_n(C).

The map

t \mapsto e^{tX}, \qquad t \in \mathbb R

defines a smooth curve in the general linear group which passes through the identity element at t = 0. In fact, this gives a one-parameter subgroup of the general linear group since

e^{tX}e^{sX} = e^{(t+s)X}.\,

The derivative of this curve (or tangent vector) at a point t is given by

\frac{d}{dt}e^{tX} = Xe^{tX} = e^{tX}X. \qquad (1)

The derivative at t = 0 is just the matrix X, which is to say that X generates this one-parameter subgroup.

More generally (R. M. Wilcox 1966),

\frac{d}{dt}e^{X(t)} = \int_0^1 e^{\alpha X(t)} \frac{dX(t)}{dt} e^{(1-\alpha) X(t)}\,d\alpha.

Factoring e^{X(t)} out of the integral on the right and expanding the integrand with the help of the Hadamard lemma, one obtains the following useful expression for the derivative of the matrix exponential:

\left(\frac{d}{dt}e^{X(t)}\right)e^{-X(t)} = \frac{d}{dt}X(t) + \frac{1}{2!}[X(t),\frac{d}{dt}X(t)] + \frac{1}{3!}[X(t),[X(t),\frac{d}{dt}X(t)]]+\cdots

The determinant of the matrix exponential

It can be shown that for any complex square matrix, the following identity holds:

 \det (e^A)= e^{\operatorname{tr}(A)}.

In addition to providing a computational tool, this formula shows that a matrix exponential is always an invertible matrix. This follows from the fact that the right-hand side of the above equation is always non-zero, and so \det (e^A) \neq 0, which means that e^A must be invertible. Another observation is the following: in the real-valued case, we see that the map

\exp \colon M_n(\mathbb R) \to \mathrm{GL}(n,\mathbb R)

is not surjective (this is in contrast with the complex case mentioned earlier). This follows from the fact that (for real-valued matrices) the right hand side of the above equation is always positive while there exist invertible matrices with a negative determinant.
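The determinant identity is easy to check numerically for a small real matrix. In the sketch below the matrix A and the helper names are choices made for this illustration; the exponential is approximated by a truncated power series, which is assumed accurate enough for a matrix of this size.

```python
import math

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def expm_series(x, terms=60):
    """Truncated power series for exp(x); adequate for small norms only."""
    n = len(x)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

A = [[1.0, 2.0], [3.0, 4.0]]
E = expm_series(A)

det_exp = E[0][0] * E[1][1] - E[0][1] * E[1][0]   # det(e^A)
exp_trace = math.exp(A[0][0] + A[1][1])           # e^{tr(A)}
print(det_exp, exp_trace)
```

Since e^{tr(A)} is positive for every real A, the computed determinant is positive as well, in line with the remark that real matrices with negative determinant are not exponentials of real matrices.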

Computing the matrix exponential

Finding reliable and accurate methods to compute the matrix exponential is difficult, and this is still a topic of considerable current research in mathematics and numerical analysis. Both MATLAB and GNU Octave use a Padé approximant.[3][4] Several methods are listed below.

Diagonalizable case

If a matrix is diagonal:

A=\begin{bmatrix} a_1 & 0 & \ldots & 0 \\
0 & a_2 & \ldots & 0  \\ \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \ldots & a_n \end{bmatrix},

then its exponential can be obtained by just exponentiating every entry on the main diagonal:

e^A=\begin{bmatrix} e^{a_1} & 0 & \ldots & 0 \\
0 & e^{a_2} & \ldots & 0  \\ \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \ldots & e^{a_n} \end{bmatrix}.

This also allows one to exponentiate diagonalizable matrices. If A = UDU^{-1} and D is diagonal, then e^A = U e^D U^{-1}. Application of Sylvester's formula yields the same result.
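A small worked check of e^A = U e^D U^{-1}: the sketch below uses the symmetric matrix A = [[2, 1], [1, 2]], chosen for this illustration, whose eigenvalues are 3 and 1 with eigenvectors (1, 1) and (1, -1). The result is compared with a truncated power series; all helper names are assumptions of the sketch.

```python
import math

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def expm_series(x, terms=40):
    """Truncated power series for exp(x); adequate for small norms only."""
    n = len(x)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

A = [[2.0, 1.0], [1.0, 2.0]]
U = [[1.0, 1.0], [1.0, -1.0]]        # columns are eigenvectors of A
U_inv = [[0.5, 0.5], [0.5, -0.5]]    # inverse of U, computed by hand
exp_D = [[math.exp(3.0), 0.0], [0.0, math.exp(1.0)]]  # exponentiate the diagonal

via_diagonalization = mat_mul(mat_mul(U, exp_D), U_inv)
via_series = expm_series(A)
print(via_diagonalization)
print(via_series)
```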

Nilpotent case

A matrix N is nilpotent if N^q = 0 for some positive integer q. In this case, the matrix exponential e^N can be computed directly from the series expansion, as the series terminates after a finite number of terms:

e^N = I + N + \frac{1}{2}N^2 + \frac{1}{6}N^3 + \cdots + \frac{1}{(q-1)!}N^{q-1}.
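As a small worked instance (the matrix is chosen for this illustration and does not come from the text), take

N = \begin{bmatrix} 0 & 1 & 2 \\ 0 & 0 & 3 \\ 0 & 0 & 0 \end{bmatrix}, \qquad N^2 = \begin{bmatrix} 0 & 0 & 3 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad N^3 = 0,

so that the series stops after three terms:

e^N = I + N + \frac{1}{2}N^2 = \begin{bmatrix} 1 & 1 & \frac{7}{2} \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}.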

Generalization

When the minimal polynomial of a matrix X can be factored into a product of first degree polynomials, it can be expressed as a sum

X = A + N \,

where

A is diagonalizable,

N is nilpotent, and

A commutes with N (i.e. AN = NA).

This is the Jordan–Chevalley decomposition.

This means that we can compute the exponential of X by reducing to the previous two cases:

e^X = e^{A+N} = e^A e^N. \,

Note that we need the commutativity of A and N for the last step to work.

Another (closely related) method if the field is algebraically closed is to work with the Jordan form of X. Suppose that X = PJP^{-1} where J is the Jordan form of X. Then

e^{X}=Pe^{J}P^{-1}.\,

Also, since

J=J_{a_1}(\lambda_1)\oplus J_{a_2}(\lambda_2)\oplus\cdots\oplus J_{a_n}(\lambda_n),

\begin{align}
e^{J} & {} = \exp \big( J_{a_1}(\lambda_1)\oplus J_{a_2}(\lambda_2)\oplus\cdots\oplus J_{a_n}(\lambda_n) \big) \\
& {} = \exp \big( J_{a_1}(\lambda_1) \big) \oplus \exp \big( J_{a_2}(\lambda_2) \big) \oplus\cdots\oplus \exp \big( J_{a_n}(\lambda_n) \big).
\end{align}

Therefore, we need only know how to compute the matrix exponential of a Jordan block. But each Jordan block is of the form

J_{a}(\lambda) = \lambda I + N \,

where N is a special nilpotent matrix. The matrix exponential of this block is given by

e^{\lambda I + N} = e^{\lambda}e^N. \,
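Because the powers of N in a Jordan block simply shift the ones to higher superdiagonals, the exponential of a block can be written down entry by entry. The following sketch assumes the convention with ones on the first superdiagonal; the function name `exp_jordan_block` is introduced here for illustration only.

```python
import math

def exp_jordan_block(lam, size):
    """Exponential of the size-by-size Jordan block with eigenvalue lam.

    N^k has ones on the k-th superdiagonal, so exp(N) has 1/k! there,
    and the whole block is scaled by exp(lam).
    """
    scale = math.exp(lam)
    return [[scale / math.factorial(j - i) if j >= i else 0.0
             for j in range(size)] for i in range(size)]

block_2 = exp_jordan_block(16.0, 2)   # the 2x2 block with eigenvalue 16
block_3 = exp_jordan_block(0.0, 3)    # a purely nilpotent 3x3 block
print(block_2)
print(block_3)
```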

Alternative

If P and Q_t are nonzero polynomials in one variable, such that P(A) = 0, and if the meromorphic function

f(z)=\frac{e^{t z}-Q_t(z)}{P(z)}

is entire, then

e^{t A} = Q_t(A).

To prove this, multiply the first of the two above equalities by P(z) and replace z by A.

Such a polynomial Q_t can be found as follows. Let a be a root of P, and Q_{a,t} the product of P by the principal part of the Laurent series of f at a. Then the sum S_t of the Q_{a,t}, where a runs over all the roots of P, can be taken as a particular Q_t. All the other Q_t will be obtained by adding a multiple of P to S_t. In particular S_t is the only Q_t whose degree is less than that of P.

Consider the case of a 2-by-2 matrix

A:=\begin{bmatrix}
a & b \\
c & d \end{bmatrix}.

The exponential matrix e^{tA} is of the form e^{tA}=s_0(t)\,I+s_1(t)\,A. (For any complex number z and any \mathbb{C}-algebra B we denote again by z the product of z by the unit of B.) Let \alpha and \beta be the roots of the characteristic polynomial

X^2-(a+d)\ X+ ad-bc. \,

Then we have

s_0(t)=\frac{\alpha\,e^{\beta t}
-\beta\,e^{\alpha t}}{\alpha-\beta},\quad 
s_1(t)=\frac{e^{\alpha t}-e^{\beta t}}{\alpha-\beta}\quad

if \alpha\not=\beta, and

s_0(t)=(1-\alpha\,t)\,e^{\alpha t},\quad 
s_1(t)=t\,e^{\alpha t}\quad

if \alpha=\beta.

In either case, writing

s = \frac{\alpha + \beta}{2}=\frac{\operatorname{tr} A}{2},

and

q = \frac{\alpha-\beta}{2}=\pm\sqrt{-\det\left(A-s I\right)},

we have

s_0(t) = e^{s t}\left(\cosh (q t) - s \frac{\sinh (q t)}{q}\right),\quad s_1(t) =e^{s t}\frac{\sinh(q t)}{q},

where

\frac{\sinh (q t)}{q} is 0 if t = 0, and t if q = 0.
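The 2-by-2 case translates directly into code. The sketch below implements e^{tA} = s_0(t) I + s_1(t) A using the formulas for s_0 and s_1 in terms of the roots \alpha and \beta given above; the function name `expm_2x2` and the tolerance used to decide whether the roots coincide are assumptions of this illustration, and the distinct-root branch can lose accuracy when the roots are nearly equal.

```python
import cmath
import math

def expm_2x2(a, b, c, d, t=1.0):
    """exp(t*A) for A = [[a, b], [c, d]] as s0(t)*I + s1(t)*A."""
    tr = a + d
    det = a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    alpha = (tr + disc) / 2      # roots of X^2 - (a+d) X + (ad - bc)
    beta = (tr - disc) / 2
    if abs(alpha - beta) > 1e-12:
        s0 = (alpha * cmath.exp(beta * t)
              - beta * cmath.exp(alpha * t)) / (alpha - beta)
        s1 = (cmath.exp(alpha * t) - cmath.exp(beta * t)) / (alpha - beta)
    else:
        s0 = (1 - alpha * t) * cmath.exp(alpha * t)
        s1 = t * cmath.exp(alpha * t)
    return [[s0 + s1 * a, s1 * b], [s1 * c, s0 + s1 * d]]

# Distinct (complex) roots: a rotation generator.
rotation = expm_2x2(0.0, 1.0, -1.0, 0.0, t=1.0)
# Repeated root: a 2x2 Jordan block with eigenvalue 2.
jordan = expm_2x2(2.0, 1.0, 0.0, 2.0, t=1.0)
print(rotation)
print(jordan)
```

The entries are returned as complex numbers because the roots may be complex; for a real matrix the imaginary parts vanish up to rounding.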

The polynomial S_t can also be given the following "interpolation" characterization. Put e_t(z):=e^{tz}, n:=\deg P. Then S_t is the unique degree <n polynomial which satisfies S_t^{(k)}(a)=e_t^{(k)}(a) whenever k is less than the multiplicity of a as a root of P.

We assume (as we obviously can) that P is the minimal polynomial of A.

We also assume that A is a diagonalizable matrix. In particular, the roots of P are simple, and the "interpolation" characterization tells us that S_t is given by the Lagrange interpolation formula.

At the other extreme, if P=(X-a)^n, then

S_t=e^{at}\ \sum_{k=0}^{n-1}\ \frac{t^k}{k!}\ (X-a)^k.

The simplest case not covered by the above observations is when P=(X-a)^2\,(X-b) with a\not=b, which gives

S_t=e^{at}\ \frac{X-b}{a-b}\ \Bigg(1+\left(t+\frac{1}{b-a}\right)(X-a)\Bigg)+e^{bt}\ \frac{(X-a)^2}{(b-a)^2}.

Via Laplace transform

As above, we know that the solution to the system of linear differential equations given by \frac{d}{dt} y(t) = Ay(t), y(0) = y_0 is y(t) = e^{At} y_0. Using the Laplace transform, letting Y(s) = \mathcal{L}\{y\}, and applying it to the differential equation, we get

sY(s) - y_0 = AY(s) \Rightarrow (sI - A)Y(s) = y_0

where I is the identity matrix. Therefore y(t) = \mathcal{L}^{-1}\{(sI-A)^{-1}\}y_0. Thus it can be concluded that e^{At} = \mathcal{L}^{-1}\{(sI-A)^{-1}\}. And from this we can find e^A by setting t = 1.
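As a small illustration of this approach (the matrix is chosen for this sketch and does not come from the text), take A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}. Then

(sI-A)^{-1} = \begin{bmatrix} s & -1 \\ 1 & s \end{bmatrix}^{-1} = \frac{1}{s^2+1}\begin{bmatrix} s & 1 \\ -1 & s \end{bmatrix},

and inverting the Laplace transform entry by entry, using \mathcal{L}^{-1}\{s/(s^2+1)\} = \cos t and \mathcal{L}^{-1}\{1/(s^2+1)\} = \sin t, gives

e^{At} = \begin{bmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{bmatrix}.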

Calculations

Suppose that we want to compute the exponential of

B=\begin{bmatrix}
21 & 17 & 6 \\
-5 & -1 & -6 \\
4 & 4 & 16 \end{bmatrix}.

Its Jordan form is

J = P^{-1}BP = \begin{bmatrix}
4 & 0 & 0 \\
0 & 16 & 1 \\
0 & 0 & 16 \end{bmatrix},

where the matrix P is given by

P=\begin{bmatrix}
-\frac14 & 2 & \frac54 \\
\frac14 & -2 & -\frac14 \\
0 & 4 & 0 \end{bmatrix}.

Let us first calculate exp(J). We have

J=J_1(4)\oplus J_2(16) \,

The exponential of a 1×1 matrix is just the exponential of the one entry of the matrix, so exp(J_1(4)) = [e^4]. The exponential of J_2(16) can be calculated by the formula exp(λI + N) = e^λ exp(N) mentioned above; this yields[5]


\begin{align}
\exp \left( \begin{bmatrix} 16 & 1 \\ 0 & 16 \end{bmatrix} \right) 
& = e^{16} \exp \left( \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \right) \\[6pt]
& = e^{16} \left(\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + {1 \over 2!}\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} + \cdots \right)
= \begin{bmatrix} e^{16} & e^{16} \\ 0 & e^{16} \end{bmatrix}.
\end{align}

Therefore, the exponential of the original matrix B is

 
\begin{align}
\exp(B) 
& = P \exp(J) P^{-1} 
= P \begin{bmatrix} e^4 & 0 & 0 \\ 0 & e^{16} & e^{16} \\ 0 & 0 & e^{16}  \end{bmatrix} P^{-1} \\[6pt]
& = {1\over 4} \begin{bmatrix}
   13e^{16} - e^4 & 13e^{16} - 5e^4 & 2e^{16} - 2e^4 \\
   -9e^{16} + e^4 & -9e^{16} + 5e^4 & -2e^{16} + 2e^4 \\
   16e^{16}       & 16e^{16}        & 4e^{16} 
\end{bmatrix}.
\end{align}
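The closed form for exp(B) can be cross-checked numerically. A plain truncated series is a poor fit here because B has entries and eigenvalues of moderate size, so the sketch below first scales B down by a power of two, applies a short Taylor series, and then squares the result repeatedly. This scaling-and-squaring variant with a Taylor core is an assumption of the sketch (the text mentions Padé approximants for production software), and the parameter choices are illustrative.

```python
import math

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def expm_scaling_squaring(x, squarings=6, terms=20):
    """exp(x) as (exp(x / 2^s))^(2^s), with a Taylor series for the inner part."""
    n = len(x)
    scaled = [[v / 2 ** squarings for v in row] for row in x]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

B = [[21.0, 17.0, 6.0], [-5.0, -1.0, -6.0], [4.0, 4.0, 16.0]]
e4, e16 = math.exp(4.0), math.exp(16.0)

# Closed form from the Jordan-form calculation above.
closed_form = [
    [(13 * e16 - e4) / 4, (13 * e16 - 5 * e4) / 4, (2 * e16 - 2 * e4) / 4],
    [(-9 * e16 + e4) / 4, (-9 * e16 + 5 * e4) / 4, (-2 * e16 + 2 * e4) / 4],
    [16 * e16 / 4, 16 * e16 / 4, 4 * e16 / 4],
]
numeric = expm_scaling_squaring(B)
max_rel_err = max(abs(numeric[i][j] - closed_form[i][j]) / abs(closed_form[i][j])
                  for i in range(3) for j in range(3))
print(max_rel_err)
```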

Applications

Linear differential equations

The matrix exponential has applications to systems of linear differential equations. (See also matrix differential equation.) Recall from earlier in this article that a differential equation of the form

 \mathbf{y}' = C\mathbf{y}

has solution e^{Ct}y(0). If we consider the vector

 \mathbf{y}(t) = \begin{pmatrix} y_1(t) \\ \vdots \\y_n(t) \end{pmatrix}

we can express a system of coupled linear differential equations as

 \mathbf{y}'(t) = A\mathbf{y}(t)+\mathbf{b}(t).

If we make an ansatz and use an integrating factor of e^{-At}, multiplying throughout, we obtain

e^{-At}\mathbf{y}'-e^{-At}A\mathbf{y} = e^{-At}\mathbf{b}
e^{-At}\mathbf{y}'-Ae^{-At}\mathbf{y} = e^{-At}\mathbf{b}
 \frac{d}{dt} (e^{-At}\mathbf{y}) = e^{-At}\mathbf{b}.

The second step is possible due to the fact that if AB=BA then e^{At}B=Be^{At}. If we can calculate e^{At}, then we can obtain the solution to the system.

Example (homogeneous)

Say we have the system

\begin{matrix}
x' &=& 2x&-y&+z \\
y' &=&   &3y&-z \\
z' &=& 2x&+y&+3z \end{matrix}

We have the associated matrix

M=\begin{bmatrix}
2 & -1 &  1 \\
0 &  3 & -1 \\
2 &  1 &  3 \end{bmatrix}

The matrix exponential is

e^{tM}=\frac{1}{2}\begin{bmatrix} 
    e^{2t}(1+e^{2t}-2t)  & -2te^{2t}    &  e^{2t}(-1+e^{2t}) \\
   -e^{2t}(-1+e^{2t}-2t) & 2(t+1)e^{2t} & -e^{2t}(-1+e^{2t}) \\
    e^{2t}(-1+e^{2t}+2t) & 2te^{2t}     &  e^{2t}(1+e^{2t})  \end{bmatrix}

so the general solution of the system is

\begin{bmatrix}x \\y \\ z\end{bmatrix}=
C_1\begin{bmatrix}e^{2t}(1+e^{2t}-2t) \\-e^{2t}(-1+e^{2t}-2t)\\e^{2t}(-1+e^{2t}+2t)\end{bmatrix}
+C_2\begin{bmatrix}-2te^{2t}\\2(t+1)e^{2t}\\2te^{2t}\end{bmatrix}
+C_3\begin{bmatrix}e^{2t}(-1+e^{2t})\\-e^{2t}(-1+e^{2t})\\e^{2t}(1+e^{2t})\end{bmatrix}

that is,


\begin{align}
x & = C_1(e^{2t}(1+e^{2t}-2t)) + C_2(-2te^{2t}) + C_3(e^{2t}(-1+e^{2t})) \\
y & = C_1(-e^{2t}(-1+e^{2t}-2t)) + C_2(2(t+1)e^{2t}) + C_3(-e^{2t}(-1+e^{2t})) \\
z & = C_1(e^{2t}(-1+e^{2t}+2t)) + C_2(2te^{2t}) + C_3(e^{2t}(1+e^{2t}))
\end{align}

Inhomogeneous case – variation of parameters

For the inhomogeneous case, we can use integrating factors (a method akin to variation of parameters). We seek a particular solution of the form y_p(t) = exp(tA) z(t):


\begin{align}
\mathbf{y}_p'(t) & = (e^{tA})'\mathbf{z}(t)+e^{tA}\mathbf{z}'(t) \\[6pt]
& = Ae^{tA}\mathbf{z}(t)+e^{tA}\mathbf{z}'(t) \\[6pt]
& = A\mathbf{y}_p(t)+e^{tA}\mathbf{z}'(t).
\end{align}

For y_p to be a solution:


\begin{align}
e^{tA}\mathbf{z}'(t) & = \mathbf{b}(t) \\[6pt]
\mathbf{z}'(t) & = (e^{tA})^{-1}\mathbf{b}(t) \\[6pt]
\mathbf{z}(t) & = \int_0^t e^{-uA}\mathbf{b}(u)\,du+\mathbf{c}.
\end{align}

So,


\begin{align}
\mathbf{y}_p(t) & {} = e^{tA}\int_0^t e^{-uA}\mathbf{b}(u)\,du+e^{tA}\mathbf{c} \\
& {} = \int_0^t e^{(t-u)A}\mathbf{b}(u)\,du+e^{tA}\mathbf{c}
\end{align}

where c is determined by the initial conditions of the problem.

More precisely, consider the equation

Y'-A\ Y=F(t)

with the initial condition Y(t_0)=Y_0, where

A is an n by n complex matrix,

F is a continuous function from some open interval I to \mathbb{C}^n,

t_0 is a point of I, and

Y_0 is a vector of \mathbb{C}^n.

Left-multiplying the above displayed equality by e^{-tA} and integrating from t_0 to t, we get

Y(t)=e^{(t-t_0)A}\ Y_0+\int_{t_0}^t e^{(t-x)A}\ F(x)\ dx.
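The formula above can be evaluated numerically by approximating the integral with a quadrature rule. The sketch below uses Simpson's rule and a truncated-series exponential; the function name `solve_inhomogeneous`, the step count, and the test problem (a nilpotent A with constant forcing, whose exact solution is Y(t) = (1 + t + t^2/2, 1 + t)) are all assumptions made for this illustration.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def expm_series(x, terms=30):
    """Truncated power series for exp(x); adequate for small norms only."""
    n = len(x)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def apply_exp(a, s, v):
    """Return exp(s*a) applied to the vector v."""
    e = expm_series([[s * x for x in row] for row in a])
    return [sum(e[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]

def solve_inhomogeneous(a, y0, f, t0, t, steps=200):
    """Y(t) = exp((t-t0)a) y0 + integral of exp((t-x)a) f(x) over [t0, t].

    The integral is approximated with Simpson's rule; `steps` must be even.
    """
    n = len(y0)
    h = (t - t0) / steps
    total = [0.0] * n
    for k in range(steps + 1):
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        x = t0 + k * h
        g = apply_exp(a, t - x, f(x))
        total = [total[i] + weight * g[i] for i in range(n)]
    integral = [h / 3 * v for v in total]
    homogeneous = apply_exp(a, t - t0, y0)
    return [homogeneous[i] + integral[i] for i in range(n)]

A = [[0.0, 1.0], [0.0, 0.0]]
Y = solve_inhomogeneous(A, [1.0, 1.0], lambda x: [0.0, 1.0], 0.0, 2.0)
print(Y)
```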

We claim that the solution to the equation

P(d/dt)\ y = f(t)

with the initial conditions y^{(k)}(t_0)=y_k for 0\le k<n is

y(t)=\sum_{k=0}^{n-1}\ y_k\ s_k(t-t_0)+\int_{t_0}^t s_{n-1}(t-x)\ f(x)\ dx,

where the notation is as follows:

P\in\mathbb{C}[X] is a monic polynomial of degree n>0,

f is a continuous complex valued function defined on some open interval I,

t_0 is a point of I,

y_k is a complex number, and

s_k(t) is the coefficient of X^k in the polynomial denoted by S_t\in\mathbb{C}[X] in Subsection Alternative above.

To justify this claim, we transform our order n scalar equation into an order one vector equation by the usual reduction to a first order system. Our vector equation takes the form

\frac{dY}{dt}-A\ Y=F(t),\quad Y(t_0)=Y_0,

where A is the transpose companion matrix of P. We solve this equation as explained above, computing the matrix exponentials by the observation made in Subsection Alternative above.

In the case n=2 we get the following statement. The solution to

y''-(\alpha+\beta)\ y'
+\alpha\,\beta\ y=f(t),\quad 
y(t_0)=y_0,\quad y'(t_0)=y_1

is

y(t)=y_0\ s_0(t-t_0)+y_1\ s_1(t-t_0)
+\int_{t_0}^t s_1(t-x)\,f(x)\ dx,

where the functions s_0 and s_1 are as in Subsection Alternative above.

Example (inhomogeneous)

Say we have the system

\begin{matrix}
x' &=& 2x & - & y & + & z & + & e^{2t} \\
y' &=&    &   & 3y& - & z & \\
z' &=& 2x & + & y & + & 3z & + & e^{2t}. \end{matrix}

So we then have

M= \left[ \begin{array}{rrr}
2 & -1 &  1 \\
0 &  3 & -1 \\
2 &  1 &  3 \end{array} \right]

and

\mathbf{b}=e^{2t}\begin{bmatrix}1 \\0\\1\end{bmatrix}.

From before, we have the general solution to the homogeneous equation. Since the sum of the homogeneous and particular solutions gives the general solution to the inhomogeneous problem, we now only need to find the particular solution (via variation of parameters).

We have, above (with A = M):

\begin{align}
\mathbf{y}_p & = e^{tM}\int_0^t e^{-uM}\begin{bmatrix}e^{2u} \\0\\e^{2u}\end{bmatrix}\,du+e^{tM}\mathbf{c} \\[6pt]
& = e^{tM}\int_0^t \begin{bmatrix} e^{-2u}+u \\ 1-u-e^{-2u} \\ e^{-2u}-u \end{bmatrix}\,du+e^{tM}\mathbf{c} \\[6pt]
& = \frac{1}{2}\,e^{tM}\begin{bmatrix} 1+t^2-e^{-2t} \\ 2t-t^2-1+e^{-2t} \\ 1-t^2-e^{-2t} \end{bmatrix}+e^{tM}\mathbf{c},
\end{align}

where e^{-uM} is the matrix exponential of M evaluated at t = -u. Carrying out the remaining matrix product gives

\mathbf{y}_p = \frac{1}{2}\,e^{2t}\begin{bmatrix} e^{2t}-1-t^2 \\ (t+1)^2-e^{2t} \\ e^{2t}-1+t^2 \end{bmatrix}+e^{tM}\mathbf{c},

which is the requisite particular solution determined through variation of parameters.

References

  1. ^ Bhatia, R. (1997). Matrix Analysis. Graduate Texts in Mathematics. 169. Springer. 
  2. ^ Lieb, E. H. (1973). "Convex trace functions and the Wigner–Yanase–Dyson conjecture". Adv. Math. 11: 267–288. Epstein, H. (1973). "Remarks on two theorems of E. Lieb". Commun. Math. Phys. 31: 317–325.
  3. ^ http://www.mathworks.de/help/techdoc/ref/expm.html
  4. ^ http://www.network-theory.co.uk/docs/octave3/octave_200.html
  5. ^ This can be generalized; in general, the exponential of J_n(a) is an upper triangular matrix with e^a/0! on the main diagonal, e^a/1! on the one above, e^a/2! on the next one, and so on.
